NORMALIZED TIME AND ITS USE IN
ARCHITECTURAL DESIGN

S.  Ho,  T.  Holman,1  L.  Snyder 
University  of  Washington, 

Seattle,  Washington 

Introduction 

Building better and faster computers is always the goal of
computer design. To this end, designers often propose
modifications and improvements to computers. Typically,
these so-called improvements also carry some cost, in
additional size or complexity. All too often, only the
benefits, and not the costs, are the subject of analysis. As
an example, the Berkeley RISC design [Patterson] had a
reduced instruction set, as well as register windows. The
extra cost of the register windows was offset by the smaller
control. But what if we had instead allocated this cost to,
say, a carry-lookahead adder, or some other part? Would this
have been a wiser choice?

Holman [1988] addressed this problem. The method of
normalized analysis is a way of fairly weighing both the
costs and the benefits of a modification [Holman 1989]. A
concrete example of such analysis is to ask:

    Do programs run faster on (parallel) computers when
    floating-point coprocessors are installed, or when the
    equivalent amount of hardware is used instead for
    additional processor elements?

We  may  repeat  this  question  for  each  additional 
proposed  modification,  such  as  multipliers,  shifters, 
etc. 

This analysis determines whether a particular modification
is, individually, cost-effective. In real designs, though,
the number of potential modifications is not one, but many.
Further, these changes may interact in various ways. A
multiplier may obviate the need for a shifter, a shifter may
duplicate part of a floating point unit, and so forth. We
need an algorithm for taking the varied set of
modifications, and choosing the subset which, working in
concert, provides the best cost-benefit ratio. We first
extend the normalized analysis to the more understandable
concept of normalized time. We then examine the effect of
selecting multiple modifications with the simplest
algorithm, the greedy algorithm.

Model 

First, we must define our model. We start with some base
architecture, and then evaluate the time and cost on a fixed
problem. The normalized time is then their product.

1 Currently with Sun Microsystems, Mountain View, CA


Definition 1 Fix the computation. Then define

$T_0$ = Time

$C_0$ = Cost

$T_0 C_0$ = Normalized Time.

The subscript zero denotes the base architecture. The units,
e.g. s·m^2, are irrelevant, as we are only making
comparisons here.

To the base architecture, we then add modifications. We
stipulate that, akin to Amdahl's law [Amdahl], some fraction
$f$ of the computation is affected by the change, which
speeds that fraction up by some factor $S$, while the rest
is left undisturbed. We also stipulate that the change
increases the cost by some fraction $c$.

As an example, a floating point coprocessor might produce a
speedup of a factor of twelve, but only on sixteen percent
of all instructions. It might also increase the cost,
measured as chip area, by thirty-eight percent. (These
figures are for relational operations in a bitonic sort on
the Transputer T800 [Holman 1989].)

Proposition 1 A modification $m$ affecting a fraction $f$,
with speedup $S$ and cost $c$, obeys

$$T = T_0 \left(1 - f + \frac{f}{S}\right)$$

$$C = C_0 (1 + c).$$

The comparison is then between the normalized times. In our
floating point example above, we find that the normalized
time is 1.18 times larger with the coprocessor than without.
The coprocessor is not used enough, in the relational
operations of this case, to be worthwhile, as the cost
exceeds the benefit.
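The arithmetic behind the 1.18 figure can be reproduced with a few lines of Python. This is a minimal sketch of Proposition 1, not code from the paper; the function name `normalized_time_ratio` and its parameter names are chosen here for illustration.

```python
def normalized_time_ratio(f: float, speedup: float, c: float) -> float:
    """Return TC / (T_0 C_0) for a single modification (Proposition 1).

    f       -- fraction of the computation affected by the modification
    speedup -- factor S by which that fraction is sped up
    c       -- fractional increase in cost
    """
    time_ratio = 1.0 - f + f / speedup   # T / T_0
    cost_ratio = 1.0 + c                 # C / C_0
    return time_ratio * cost_ratio


# Floating point coprocessor figures quoted in the text:
# speedup 12 on 16% of instructions, 38% more chip area.
ratio = normalized_time_ratio(f=0.16, speedup=12.0, c=0.38)
print(f"{ratio:.2f}")  # prints 1.18
```

A ratio above one means the modified design has a worse normalized time than the base, which is the case here.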

We can combine modifications by summing their individual
changes in time and in cost. For simplicity, let us assume
that the modifications do not interact. Interacting
combinations would have a speedup term for each possible
combination, but would otherwise be similar.

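Proposition 2 For a set of noninteracting modifications
$m_i$, given $f_i$, $c_i$, $S_i$, we have

$$T = T_0 \left(1 - \sum_i f_i + \sum_i \frac{f_i}{S_i}\right)$$

$$C = C_0 \left(1 + \sum_i c_i\right).$$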
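A minimal Python sketch of combining noninteracting modifications, assuming (as the text states) that their time savings and cost increments simply add. The tuple layout `(f, S, c)`, the name `combined_ratio`, and the shifter figures are assumptions made here for illustration; only the coprocessor figures come from the text.

```python
from typing import Iterable, Tuple

Modification = Tuple[float, float, float]  # (f_i, S_i, c_i)


def combined_ratio(mods: Iterable[Modification]) -> float:
    """Return TC / (T_0 C_0) for a set of noninteracting modifications."""
    mods = list(mods)
    # Each modification removes f - f/S of the base time.
    time_ratio = 1.0 - sum(f - f / s for f, s, _ in mods)
    # Each modification adds its own fraction c to the base cost.
    cost_ratio = 1.0 + sum(c for _, _, c in mods)
    return time_ratio * cost_ratio


fpu = (0.16, 12.0, 0.38)     # figures quoted in the text
shifter = (0.10, 4.0, 0.05)  # hypothetical values, for illustration only

print(combined_ratio([fpu]))
print(combined_ratio([fpu, shifter]))
```

With a single modification the function reduces to Proposition 1, and with no modifications it returns one, the base itself.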


Two  Are  Better  Than  One 


Before we consider the greedy algorithm, let us first
examine the effect of the simplest combination: two
noninteracting modifications combined. In this case, the
combined cost is less than the product of the two individual
costs, each taken relative to the base. This is expressed by
the inequality

$$1 + c_1 + c_2 < (1 + c_1)(1 + c_2).$$

Graphically, this is demonstrated in Figure 1. The product
overestimates the cost by the dashed interaction term.

[Figure 1: Representation of cross terms. Diagram labels:
Base Architecture, m_2.]

algorithm greedy
    do
        X ← ∅
        for each i, m_i ∈ M
            compute TC
            if TC < T_0 C_0 then X ← X ∪ {m_i}
        Base ← Base ∪ X
    while X ≠ ∅

Figure 2: The Greedy Algorithm

A similar relationship holds for the time. Thus, we have

Theorem 1 The combined normalized time of two noninteracting
modifications is less than the product of their separate
normalized times.

$$\frac{T_{12} C_{12}}{T_0 C_0} < \frac{T_1 C_1}{T_0 C_0} \cdot \frac{T_2 C_2}{T_0 C_0}$$

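The step from the cost inequality to Theorem 1 can be spelled out as follows. This is a sketch rather than the authors' own derivation. The shorthand $a_i = f_i (1 - 1/S_i)$, the fraction of base time saved by modification $i$, is introduced here for convenience, and the argument assumes $S_i \ge 1$, $c_i > 0$, $a_i < 1$, and $a_1 + a_2 \le 1$.

```latex
\begin{align*}
\frac{T_i}{T_0} &= 1 - a_i, \qquad
\frac{T_{12}}{T_0} = 1 - a_1 - a_2 \le (1 - a_1)(1 - a_2)
  && \text{since } a_1 a_2 \ge 0, \\
\frac{C_{12}}{C_0} &= 1 + c_1 + c_2 < (1 + c_1)(1 + c_2)
  && \text{since } c_1 c_2 > 0, \\
\frac{T_{12} C_{12}}{T_0 C_0}
  &\le (1 - a_1)(1 - a_2)(1 + c_1 + c_2) \\
  &< (1 - a_1)(1 + c_1)\,(1 - a_2)(1 + c_2)
   = \frac{T_1 C_1}{T_0 C_0} \cdot \frac{T_2 C_2}{T_0 C_0}.
\end{align*}
```

Both the time and the cost contribute a nonnegative cross term, so the product of the separate ratios always overstates the combined ratio.
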
The  Greedy  Algorithm 

Now we may consider the greedy algorithm, illustrated in
Figure 2, for minimizing normalized time. The algorithm
considers each modification in turn. All modifications that
compare favorably with the current base are added, becoming
part of the new base architecture, and the process repeats
until no (individually) favorable modifications remain.
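The procedure described above can be sketched in Python as follows. This is one possible reading of the algorithm under the noninteracting model, not the authors' implementation. The names `ratio`, `greedy`, and `MODS` are hypothetical, as are the multiplier and shifter figures; only the coprocessor ("fpu") figures come from the text.

```python
from typing import Dict, Set, Tuple

Modification = Tuple[float, float, float]  # (f, S, c) relative to the original base


def ratio(chosen: Set[str], mods: Dict[str, Modification]) -> float:
    """Normalized time of `chosen` relative to the original base."""
    time_ratio = 1.0 - sum(mods[k][0] * (1.0 - 1.0 / mods[k][1]) for k in chosen)
    cost_ratio = 1.0 + sum(mods[k][2] for k in chosen)
    return time_ratio * cost_ratio


def greedy(mods: Dict[str, Modification]) -> Set[str]:
    """Repeatedly fold every individually favorable modification into the base."""
    base: Set[str] = set()
    while True:
        base_tc = ratio(base, mods)
        # A modification is favorable if adding it alone beats the current base.
        favorable = {name for name in mods
                     if name not in base and ratio(base | {name}, mods) < base_tc}
        if not favorable:
            return base
        base |= favorable


MODS = {
    "multiplier": (0.30, 10.0, 0.10),  # hypothetical values
    "shifter": (0.40, 2.0, 0.30),      # hypothetical values
    "fpu": (0.16, 12.0, 0.38),         # figures quoted in the text
}
result = greedy(MODS)
print(sorted(result))
```

With these illustrative figures only the multiplier is favorable at first. Once it joins the base the shifter becomes favorable, and then the coprocessor, so the loop needs several passes before it stops.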

Unfortunately, the greedy algorithm ignores the cross term
described above.

Theorem  2  The  greedy  algorithm  is  suboptimal. 

As an example, take $f_1 = f_2 = 1/2$, $c_1 = c_2 = 1$, and
$S_1 = S_2 = 5$. We then have

$$\frac{T_1 C_1}{T_0 C_0} = \frac{T_2 C_2}{T_0 C_0} = \frac{6}{5}$$

$$\frac{T_{12} C_{12}}{T_0 C_0} = \frac{3}{5}$$


Neither $m_1$ nor $m_2$, taken alone, is worthwhile. The
algorithm will leave the base architecture untouched, yet
the optimal set is $\{m_1, m_2\}$.
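The counterexample is small enough to check by exhaustive search. The sketch below enumerates every subset of the two modifications and compares the best one with what a single greedy pass from the untouched base would keep. The names `MODS`, `ratio`, `best`, and `greedy_pick` are chosen here for illustration, and the noninteracting model is assumed.

```python
from itertools import combinations

# (f, S, c) for the two modifications in the example above.
MODS = {"m1": (0.5, 5.0, 1.0), "m2": (0.5, 5.0, 1.0)}


def ratio(chosen):
    """Normalized time of `chosen` relative to the base, assuming no interaction."""
    time_ratio = 1.0 - sum(MODS[k][0] * (1.0 - 1.0 / MODS[k][1]) for k in chosen)
    cost_ratio = 1.0 + sum(MODS[k][2] for k in chosen)
    return time_ratio * cost_ratio


# Enumerate all 2^n subsets; feasible only for small n.
subsets = [frozenset(s)
           for r in range(len(MODS) + 1)
           for s in combinations(MODS, r)]
best = min(subsets, key=ratio)

# One greedy pass from the untouched base: keep what helps on its own.
greedy_pick = {k for k in MODS if ratio({k}) < 1.0}

for s in subsets:
    print(sorted(s), round(ratio(s), 3))
print("greedy:", sorted(greedy_pick), "optimal:", sorted(best))
```

The search reproduces the ratios 6/5 for each modification alone and 3/5 for the pair, so the greedy pass keeps nothing while the optimum takes both.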

Nevertheless, the greedy algorithm is conservative, in the
sense that every greedily chosen modification is also a
member of the optimal set. This is because the cross term is
always positive.

Theorem  3  The  greedy  algorithm  is  conservative. 

Proof: Let $G$ be the greedily chosen set, and $S$ the
optimal set of modifications. Consider $G - S$. If nonempty,
it must have normalized time less than one. Then
$S \cup (G - S)$ must have normalized time better than $S$,
which is optimal, a contradiction. Therefore
$G - S = \emptyset$, or $G \subseteq S$.
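The proof leans on a fact it does not state: a modification that lowers normalized time relative to some set of modifications still lowers it relative to any larger set. One way to fill this step, under the noninteracting model of Proposition 2, is sketched below. The shorthand is introduced here rather than taken from the paper: $a_m = f_m (1 - 1/S_m)$ is the fraction of base time saved by $m$, and $a_A$, $c_A$ are the sums of $a_i$ and $c_i$ over a set $A$, with the total saving assumed to stay at most one.

```latex
\begin{align*}
&\text{Adding } m \text{ to } A \text{ lowers the normalized time} \\
&\iff (1 - a_A - a_m)(1 + c_A + c_m) < (1 - a_A)(1 + c_A) \\
&\iff c_m (1 - a_A - a_m) < a_m (1 + c_A). \\
&\text{For } A \subseteq B:\quad
 c_m (1 - a_B - a_m) \le c_m (1 - a_A - a_m)
 < a_m (1 + c_A) \le a_m (1 + c_B).
\end{align*}
```

Apply this to the first modification the greedy algorithm picks that lies outside $S$. The base at that round is contained in $S$, so adding the modification to $S$ would still lower the normalized time, contradicting the optimality of $S$.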

Conclusion 

We began with the idea of normalized analysis: that the cost
of a modification is just as important as its benefits. We
have extended the model of normalized time to multiple
groups of modifications. We then analyzed the results of the
simplest algorithm, the greedy algorithm, as a tool for
selecting the best set of modifications.

In doing so, we find that the greedy algorithm is provably
suboptimal, even for the very simple types of modifications
considered here. Nevertheless, since it is a conservative
algorithm, it is still useful as a starting point for
further selection. By running the fast and simple greedy
algorithm, we can select many of the same modifications that
would be found by any better algorithm, thus reducing the
number of choices that the other algorithm must make.

With this theoretical basis, and the results of an initial
algorithm, it may now be possible for computer designers to
select, in a more analytical manner, which of the multitude
of potential modifications to include in a computer system.